The Elephant and the Mouse: Non-Strict Fine-Grain Synchronization for Many-Core Architectures

Authors

Juergen Ributzka

Yuhei Hayashi

Guang R. Gao

Abstract

A synchronization mechanism called I-Structure was introduced in the late 1970s under the dataflow model of computation. I-Structures exhibit the following important features: (1) synchronization is dataflow style, i.e., it occurs only between I-Structure producer and consumer operations that access the same memory location; (2) it is fine-grain, i.e., it synchronizes at a finer memory granularity than the whole data structure (for instance, at each individual array element, whereas barrier synchronization works at the data structure level); (3) it is lenient (non-strict), i.e., an I-Structure load can be issued (non-blocking) even before the corresponding I-Structure store is issued or completed. This paper reports a study of I-Structures in the context of modern many-core chip architectures. The major points examined include:

• The creation of an I-Structure style design that exploits a lenient synchronization model on a modern many-core architecture, the IBM Cyclops-64.
• The implementation and integration of our design in the DEEP emulation system, which can simulate the entire Cyclops-64 chip at gate level. This allows us to assess the feasibility of its hardware design and implementation.
• The demonstration of the advantages of I-Structure style synchronization, especially its lenient synchronization feature, on the Cyclops-64 architecture through an experimental case study using wavefront computation. A quantitative comparison to traditional control-flow based synchronization, such as signal-wait, is reported.
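The three features listed in the abstract can be illustrated with a minimal software sketch. This is not the paper's Cyclops-64 hardware design; it is a toy, single-threaded Python model, and the names `IStructure`, `load`, `store`, and `demo` are hypothetical. It assumes each element is written at most once and that a load issued before the store is recorded on a per-element deferred list instead of blocking.

```python
from typing import Any, Callable, List

_EMPTY = object()  # sentinel marking an element that has not been stored yet


class IStructure:
    """Toy single-threaded model of an I-Structure array (illustration only)."""

    def __init__(self, size: int) -> None:
        self._values: List[Any] = [_EMPTY] * size
        # One deferred-consumer list per element: synchronization is per
        # memory location (fine-grain), not per data structure.
        self._deferred: List[List[Callable[[Any], None]]] = [
            [] for _ in range(size)
        ]

    def load(self, index: int, consumer: Callable[[Any], None]) -> None:
        """Issue a non-blocking load; `consumer` runs once the value exists."""
        value = self._values[index]
        if value is _EMPTY:
            # Lenient case: the store has not happened yet, so defer the read.
            self._deferred[index].append(consumer)
        else:
            consumer(value)

    def store(self, index: int, value: Any) -> None:
        """Write an element once and satisfy every load deferred on it."""
        if self._values[index] is not _EMPTY:
            raise RuntimeError(f"element {index} was already stored")
        self._values[index] = value
        waiting, self._deferred[index] = self._deferred[index], []
        for consumer in waiting:
            consumer(value)


def demo() -> list:
    cells = IStructure(3)
    seen = []
    cells.load(1, lambda v: seen.append(("early", v)))  # issued before the store
    cells.store(1, 42)                                  # wakes the deferred load
    cells.load(1, lambda v: seen.append(("late", v)))   # satisfied immediately
    return seen
```

In this sketch the early load returns control to the caller at once and its consumer fires only when the matching store arrives, whereas a signal-wait scheme would have the consumer wait until the producer signals.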


Related resources

Efficient Synchronization for a Large-scale Multi-core Chip Architecture

Multi-core architectures are becoming mainstream, permitting increasing on-chip parallelism through hardware support for multithreading. Synchronization, especially fine-grain synchronization, is essential to the effective utilization of the computational power of high-performance large-scale multi-core architectures. However, designing and implementing fine-grain synchronization in such architect...


Efficient Fine-Grain Synchronization on a Multi-Core Chip Architecture: A Fresh Look

Multi-core chip architectures are becoming mainstream, permitting increasing on-chip parallelism through hardware support for multithreading. Fine-grain synchronization is essential to the effective utilization of the capacity provided by future high-performance multi-core architectures. However, there are also new challenges in realizing such fine-grain synchronization in large-scale multi-core c...


An Efficient Synchronisation Mechanism for Multi-Core Systems

The use of efficient synchronization mechanisms is crucial for implementing fine-grained parallel programs on modern shared-cache multi-core architectures. In this paper we study this problem by considering Single-Producer/Single-Consumer (SPSC) coordination using unbounded queues. A novel unbounded SPSC algorithm capable of reducing the row synchronization latency and speeding up Producer-Cons...


Ultra-Low-Energy DSP Processor Design for Many-Core Parallel Applications

Background and Objectives: Digital signal processors are widely used in energy constrained applications in which battery lifetime is a critical concern. Accordingly, designing ultra-low-energy processors is a major concern. In this work and in the first step, we propose a sub-threshold DSP processor. Methods: As our baseline architecture, we use a modified version of an existing ultra-low-power...


A Study of Parallel Betweenness Centrality Algorithm on a Manycore Architecture

Large-scale graph analysis algorithms, such as those in the SCCA2 benchmarks studied in this paper, play an increasingly important role in high-performance computing applications. Unlike most traditional scientific computing applications, graph algorithms often show dynamic and irregular computing behavior. It is difficult to attain good performance on large-scale conventional parallel arc...




Publication date: 2010
